ABSTRACT

Analysis of variance (ANOVA) was invented in the
1920s to partition the variance of a single dependent variable into
uncorrelated parts. Having uncorrelated parts makes the computations
involved in ANOVA considerably easier. This was important before
computers were invented, when calculations were all done by hand and
were also repeated to check for calculation errors. This paper
demonstrates that ANOVA effects in a balanced design are perfectly
uncorrelated. A mathematical proof is presented that the four sums-of-squares
(SOS) partitions (two main effects, one two-way interaction, and
error) for a factorial two-way design are all uncorrelated, i.e., sum
exactly to the SOS of the dependent variable, and a
small heuristic data set is included in an appendix to illustrate the
proof. (Contains 71 references.) (Author/SLD)
Fisher invented ANOVA in the 1920s to partition the variance of a single dependent variable into uncorrelated parts. Having uncorrelated parts makes the computations involved in ANOVA considerably easier. This was important before computers were invented, when calculations were all done by hand and were also repeated to check for calculation errors.

The present paper demonstrates that ANOVA effects in a balanced design are perfectly uncorrelated. A mathematical proof is presented that the 4 sums-of-squares partitions (2 main effects, 1 two-way interaction, and error) for a factorial two-way design are all uncorrelated, i.e., sum exactly to the SOS of the dependent variable, and a small heuristic data set is included to illustrate the proof.

Let A be the independent variable with levels $1,\ldots,j,\ldots,a$ and subjects $1,\ldots,i,\ldots,n$. For the one-factor case, we can describe the influences responsible for the performance of the $i$th subject in the $j$th treatment group by writing the $ij$th response in terms of the sum of (the overall mean performance of all subjects) and (the difference between the $j$th treatment mean and the overall mean) and (the unexplained component of the $i$th subject's score).

The statistical model for the one-factor completely randomized design with fixed effects is given by

$$X_{ij} = \mu + (\mu_j - \mu) + (X_{ij} - \mu_j),$$

which completely accounts for the $ij$th response (Kennedy & Bush, 1985).

For ease of notation, let $\alpha_j = (\mu_j - \mu)$ and $\varepsilon_{ij} = (X_{ij} - \mu_j)$, allowing us to rewrite the model as

$$X_{ij} = \mu + \alpha_j + \varepsilon_{ij}. \tag{1}$$

Let $\bar{x}_{\cdot\cdot} \to \mu$ and $\bar{x}_{\cdot j} \to \mu_j$ be the least-squares estimators of the population parameters in the above model. (Note that $\cdot$ indicates that the subscript varies over all cases whereas the explicit subscript remains fixed. For example, $\bar{x}_{\cdot j}$ is the mean of the $j$th treatment group over all subjects $1,\ldots,n$.) Thus, the working model is given as

$$x_{ij} = \bar{x}_{\cdot\cdot} + (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot}) + (x_{ij} - \bar{x}_{\cdot j}) \tag{2}$$







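where $\alpha_j = (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})$ and $\varepsilon_{ij} = (x_{ij} - \bar{x}_{\cdot j})$. Note that $\alpha_j$ denotes the effect for the $j$th level of the independent variable A and that $\sum_{j=1}^{a} \alpha_j = 0$. (The assumption of fixed effects is important for this result.) Also, recall that $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma_\varepsilon^2)$.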
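The working model (2) can be checked numerically. The following is a minimal sketch, assuming a small made-up balanced data set (the scores and the variable names are illustrative and not taken from the paper); it rebuilds each score from the grand mean, the treatment effect, and the residual, and confirms that the treatment effects sum to zero.

```python
import numpy as np

# Hypothetical scores (not from the paper): rows are subjects i = 1..n,
# columns are treatment groups j = 1..a.
x = np.array([[5.0, 8.0, 2.0],
              [6.0, 8.0, 3.0],
              [7.0, 9.0, 4.0]])

grand_mean = x.mean()              # estimate of mu
group_means = x.mean(axis=0)       # estimates of mu_j, one per treatment group
alpha = group_means - grand_mean   # treatment effects alpha_j
eps = x - group_means              # residuals eps_ij (group means broadcast over subjects)

# Working model (2): each score is the sum of its three parts.
rebuilt = grand_mean + alpha + eps
print(np.allclose(rebuilt, x))
print(np.isclose(alpha.sum(), 0.0))
```

Both checks rely only on the definitions of the estimators, so they should hold for any balanced data placed in `x`.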

To generalize to the two-factor case, let A and B be two independent variables with levels $1,\ldots,j,\ldots,a$ for A, levels $1,\ldots,k,\ldots,b$ for B, and subjects $1,\ldots,i,\ldots,n$. Note that the set of all values

$$x_{ijk}, \quad \text{for all } i = 1,\ldots,n,\; j = 1,\ldots,a, \text{ and } k = 1,\ldots,b$$

can be thought of as a vector with $n \cdot a \cdot b$ entries. For example, suppose $n = 2$, $a = 2$, and $b = 3$. Then this vector has $2 \cdot 2 \cdot 3 = 12$ entries. We write $X = [x_{ijk}]$ to stand for the vector having $n \cdot a \cdot b$ entries. This is an $n \times a \times b$ vector, commonly called a tensor (a tensor can be conceptualized as a 3-dimensional matrix), whose mean is given by

$$\bar{x}_{\cdot\cdot\cdot} = \frac{1}{nab} \sum_{i=1}^{n} \sum_{j=1}^{a} \sum_{k=1}^{b} x_{ijk}.$$

We can now describe and completely account for the $ijk$th response in a similar manner to the one-factor case $ij$th response by generalizing the statistical model in equation (1) to

$$X_{ijk} = \mu + \alpha_j + \beta_k + \alpha\beta_{jk} + \varepsilon_{ijk}. \tag{1*}$$



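For ease of notation in writing down the generalized model, we will use the following shorthand:

$$\sum_i = \sum_{i=1}^{n}, \qquad \sum_j = \sum_{j=1}^{a}, \qquad \sum_k = \sum_{k=1}^{b}.$$

It follows that

$$\sum_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{a}, \qquad \sum_{ik} = \sum_{i=1}^{n}\sum_{k=1}^{b}, \qquad \sum_{jk} = \sum_{j=1}^{a}\sum_{k=1}^{b}, \qquad \sum_{ijk} = \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b}.$$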










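The least-squares type estimators are now given as

$$\text{(a)}\;\; \bar{x}_{\cdot\cdot\cdot} = \frac{1}{abn} \sum_{ijk} x_{ijk} \to \mu, \qquad \text{(b)}\;\; \bar{x}_{\cdot j \cdot} = \frac{1}{bn} \sum_{ik} x_{ijk} \to \mu_j,$$

$$\text{(c)}\;\; \bar{x}_{\cdot\cdot k} = \frac{1}{an} \sum_{ij} x_{ijk} \to \mu_k, \qquad \text{(d)}\;\; \bar{x}_{\cdot jk} = \frac{1}{n} \sum_{i} x_{ijk} \to \mu_{jk}.$$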

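Using these estimators, we can define the components of (1)* as

$$\alpha_j = (\bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot}), \qquad \beta_k = (\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot\cdot}), \qquad \alpha\beta_{jk} = (\bar{x}_{\cdot jk} - \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot k} + \bar{x}_{\cdot\cdot\cdot}),$$

$$\varepsilon_{ijk} = (x_{ijk} - \bar{x}_{\cdot jk}).$$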

Using the estimator $\bar{x}_{\cdot\cdot\cdot}$ for $\mu$, and subtracting it from both sides of equation (1)*, we have

$$x_{ijk} - \bar{x}_{\cdot\cdot\cdot} = \alpha_j + \beta_k + \alpha\beta_{jk} + \varepsilon_{ijk}. \tag{*}$$
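As an illustration of estimators (a) through (d) and of equation (*), the sketch below stores the data as a NumPy array of shape (n, a, b) and builds the four component tensors. The function name `decompose`, the axis ordering, and the random data are assumptions made for this example only.

```python
import numpy as np

def decompose(x):
    """Split an (n, a, b) array into the component tensors of equation (*)."""
    grand = x.mean()                              # estimator (a)
    mean_j = x.mean(axis=(0, 2), keepdims=True)   # estimator (b), shape (1, a, 1)
    mean_k = x.mean(axis=(0, 1), keepdims=True)   # estimator (c), shape (1, 1, b)
    mean_jk = x.mean(axis=0, keepdims=True)       # estimator (d), shape (1, a, b)
    # Broadcast every component to the full (n, a, b) shape so each is a tensor
    # with one entry per observation.
    alpha = np.broadcast_to(mean_j - grand, x.shape)
    beta = np.broadcast_to(mean_k - grand, x.shape)
    alpha_beta = np.broadcast_to(mean_jk - mean_j - mean_k + grand, x.shape)
    eps = x - mean_jk
    return grand, alpha, beta, alpha_beta, eps

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 2, 3))   # n = 2, a = 2, b = 3, so 12 entries
grand, alpha, beta, alpha_beta, eps = decompose(x)

# Equation (*): the deviation from the grand mean equals the sum of the four parts.
print(np.allclose(x - grand, alpha + beta + alpha_beta + eps))
```

Broadcasting every component to the full shape mirrors the paper's view of each effect as a tensor with n·a·b entries.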



We will use equation (*) and the least-squares type estimators (a) - (d) to prove the following claim.

Claim: SS(Total) = SS(A) + SS(B) + SS(AB) + SS(Error), where all the SS terms are uncorrelated.

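Proof: If we square both sides of (*) and sum over all $ijk$, we have

$$
\begin{aligned}
\sum_{ijk} (x_{ijk} - \bar{x}_{\cdot\cdot\cdot})^2
&= \sum_{ijk} (\alpha_j + \beta_k + \alpha\beta_{jk} + \varepsilon_{ijk})^2 \\
&= \sum_{ijk} \alpha_j^2 + \sum_{ijk} \beta_k^2 + \sum_{ijk} (\alpha\beta_{jk})^2 + \sum_{ijk} \varepsilon_{ijk}^2 \\
&\quad + 2\Big[ \sum_{ijk} \alpha_j \beta_k + \sum_{ijk} \alpha_j\, \alpha\beta_{jk} + \sum_{ijk} \alpha_j \varepsilon_{ijk} + \sum_{ijk} \beta_k\, \alpha\beta_{jk} \\
&\qquad\quad + \sum_{ijk} \beta_k \varepsilon_{ijk} + \sum_{ijk} \alpha\beta_{jk}\, \varepsilon_{ijk} \Big] \quad \text{(mixed terms)}
\end{aligned}
\tag{**}
$$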



It is important to note at this point that if it can be shown that each of $\bar{\alpha}$, $\bar{\beta}$, $\overline{\alpha\beta}$, and $\bar{\varepsilon}$ is zero, then each of the mixed terms in (**) represents the covariance of two tensors. In general, the covariance of two tensors U and V is given by

$$\operatorname{cov}(U, V) = \sum_{ijk} (u_{ijk} - \bar{u})(v_{ijk} - \bar{v}).$$

Thus, it will suffice to show that each of $\bar{\alpha}$, $\bar{\beta}$, $\overline{\alpha\beta}$, and $\bar{\varepsilon}$ is zero and that each of the mixed terms above is equal to zero, since U and V are uncorrelated (perpendicular or orthogonal) if and only if

$$\operatorname{cov}(U, V) = 0.$$
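The covariance formula above can be sketched in a few lines. The helper name `tensor_cov` and the random tensors are assumptions for illustration; as in the formula, no 1/N factor is applied. The check shows why zero means matter: for centered tensors the covariance is just the sum of elementwise products, which is the form of each mixed term in (**).

```python
import numpy as np

def tensor_cov(u, v):
    """Sum over all entries of the products of deviations from each tensor's mean."""
    return float(np.sum((u - u.mean()) * (v - v.mean())))

rng = np.random.default_rng(1)
u = rng.normal(size=(2, 2, 3))
v = rng.normal(size=(2, 2, 3))

# Centered copies have mean zero, so their covariance reduces to sum(u0 * v0).
u0 = u - u.mean()
v0 = v - v.mean()
print(np.isclose(tensor_cov(u0, v0), np.sum(u0 * v0)))
```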



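Subclaim 1: Each of $\bar{\alpha}$, $\bar{\beta}$, $\overline{\alpha\beta}$, and $\bar{\varepsilon}$ is zero.

I. Consider $\bar{\alpha}$.

$$
\begin{aligned}
\bar{\alpha} &= \frac{1}{abn} \sum_{ijk} \alpha_j = \frac{1}{abn} \sum_{ijk} (\bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot}) = \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot j \cdot} - \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot\cdot\cdot} \\
&= \frac{1}{abn}\, nb \sum_{j} \bar{x}_{\cdot j \cdot} - \frac{1}{abn}\, abn\, \bar{x}_{\cdot\cdot\cdot} = \frac{1}{a} \sum_{j} \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot} = \bar{x}_{\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = 0.
\end{aligned}
$$

II. Consider $\bar{\beta}$.

$$
\begin{aligned}
\bar{\beta} &= \frac{1}{abn} \sum_{ijk} \beta_k = \frac{1}{abn} \sum_{ijk} (\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot\cdot}) = \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot\cdot k} - \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot\cdot\cdot} \\
&= \frac{1}{abn}\, na \sum_{k} \bar{x}_{\cdot\cdot k} - \frac{1}{abn}\, abn\, \bar{x}_{\cdot\cdot\cdot} = \frac{1}{b} \sum_{k} \bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot\cdot} = \bar{x}_{\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = 0.
\end{aligned}
$$

III. Consider $\bar{\varepsilon}$.

$$
\begin{aligned}
\bar{\varepsilon} &= \frac{1}{abn} \sum_{ijk} \varepsilon_{ijk} = \frac{1}{abn} \sum_{ijk} (x_{ijk} - \bar{x}_{\cdot jk}) = \frac{1}{abn} \sum_{ijk} x_{ijk} - \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot jk} \\
&= \bar{x}_{\cdot\cdot\cdot} - \frac{1}{ab} \sum_{jk} \bar{x}_{\cdot jk} = \bar{x}_{\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = 0.
\end{aligned}
$$

IV. Consider $\overline{\alpha\beta}$.

$$
\begin{aligned}
\overline{\alpha\beta} &= \frac{1}{abn} \sum_{ijk} \alpha\beta_{jk} = \frac{1}{abn} \sum_{ijk} (\bar{x}_{\cdot jk} - \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot k} + \bar{x}_{\cdot\cdot\cdot}) \\
&= \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot jk} - \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot j \cdot} - \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot\cdot k} + \frac{1}{abn} \sum_{ijk} \bar{x}_{\cdot\cdot\cdot} \\
&= \frac{1}{ab} \sum_{jk} \bar{x}_{\cdot jk} - \frac{1}{a} \sum_{j} \bar{x}_{\cdot j \cdot} - \frac{1}{b} \sum_{k} \bar{x}_{\cdot\cdot k} + \bar{x}_{\cdot\cdot\cdot} \\
&= \bar{x}_{\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} + \bar{x}_{\cdot\cdot\cdot} = 0.
\end{aligned}
$$

Thus, each of $\bar{\alpha}$, $\bar{\beta}$, $\overline{\alpha\beta}$, and $\bar{\varepsilon}$ is zero and subclaim 1 is proved.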
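Subclaim 1 can also be confirmed numerically. This is a rough sketch on randomly generated balanced data; the dimensions, the seed, and the dictionary layout are arbitrary choices for illustration. Each component tensor should average to zero up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)
n, a, b = 4, 3, 5                                # arbitrary balanced design
x = rng.normal(loc=10.0, size=(n, a, b))

grand = x.mean()
mean_j = x.mean(axis=(0, 2), keepdims=True)      # A-level means
mean_k = x.mean(axis=(0, 1), keepdims=True)      # B-level means
mean_jk = x.mean(axis=0, keepdims=True)          # cell means

components = {
    "alpha": np.broadcast_to(mean_j - grand, x.shape),
    "beta": np.broadcast_to(mean_k - grand, x.shape),
    "alpha_beta": np.broadcast_to(mean_jk - mean_j - mean_k + grand, x.shape),
    "eps": x - mean_jk,
}

# Mean of each component tensor over all n*a*b entries.
component_means = {name: float(t.mean()) for name, t in components.items()}
for name, value in component_means.items():
    print(name, abs(value) < 1e-12)
```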



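Note that all of the possible mixed or combination terms of the four components

$$\alpha_j = (\bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot}), \quad \beta_k = (\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot\cdot}), \quad \alpha\beta_{jk} = (\bar{x}_{\cdot jk} - \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot k} + \bar{x}_{\cdot\cdot\cdot}),$$

and $\varepsilon_{ijk} = (x_{ijk} - \bar{x}_{\cdot jk})$ from equation (*) are represented in equation (**).

Thus, since each of $\bar{\alpha}$, $\bar{\beta}$, $\overline{\alpha\beta}$, and $\bar{\varepsilon}$ is zero, now showing each mixed term equal to zero will consequently show that

$$\alpha \perp \beta, \quad \alpha \perp \alpha\beta, \quad \alpha \perp \varepsilon, \quad \beta \perp \alpha\beta, \quad \beta \perp \varepsilon, \quad \text{and} \quad \alpha\beta \perp \varepsilon.$$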

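Subclaim 2: Each of the 6 mixed terms in equation (**) is equal to 0.

I. Consider the first mixed term of (**), $\sum_{ijk} \alpha_j \beta_k$.

$$
\begin{aligned}
\sum_{ijk} \alpha_j \beta_k
&= \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} \alpha_j \beta_k
= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \sum_{k=1}^{b} \beta_k \Big) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \sum_{k=1}^{b} (\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot\cdot}) \Big) && \text{by definition of (*)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \frac{1}{an} \sum_{k=1}^{b}\sum_{i=1}^{n}\sum_{j=1}^{a} x_{ijk} - b\,\bar{x}_{\cdot\cdot\cdot} \Big) && \text{by (c)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \big( b\,\bar{x}_{\cdot\cdot\cdot} - b\,\bar{x}_{\cdot\cdot\cdot} \big) && \text{by (a)} \\
&= 0.
\end{aligned}
$$

II. Consider the second mixed term of (**), $\sum_{ijk} \alpha_j\, \alpha\beta_{jk}$.

$$
\begin{aligned}
\sum_{ijk} \alpha_j\, \alpha\beta_{jk}
&= \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} \alpha_j\, \alpha\beta_{jk}
= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \sum_{k=1}^{b} \alpha\beta_{jk} \Big) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \sum_{k=1}^{b} (\bar{x}_{\cdot jk} - \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot k} + \bar{x}_{\cdot\cdot\cdot}) \Big) && \text{by definition of (*)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \sum_{k=1}^{b} \bar{x}_{\cdot jk} - \sum_{k=1}^{b} \bar{x}_{\cdot j \cdot} - \sum_{k=1}^{b} \bar{x}_{\cdot\cdot k} + \sum_{k=1}^{b} \bar{x}_{\cdot\cdot\cdot} \Big) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \frac{1}{n} \sum_{i=1}^{n}\sum_{k=1}^{b} x_{ijk} - b\,\bar{x}_{\cdot j \cdot} - \frac{1}{an} \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} x_{ijk} + b\,\bar{x}_{\cdot\cdot\cdot} \Big) && \text{by (d), (b), (c)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \Big( \frac{1}{n}\, bn\, \bar{x}_{\cdot j \cdot} - b\,\bar{x}_{\cdot j \cdot} - b\,\bar{x}_{\cdot\cdot\cdot} + b\,\bar{x}_{\cdot\cdot\cdot} \Big) && \text{by (b) and (a)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{a} \alpha_j \big( b\,\bar{x}_{\cdot j \cdot} - b\,\bar{x}_{\cdot j \cdot} \big) \\
&= 0.
\end{aligned}
$$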



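III. Consider the third mixed term of (**), $\sum_{ijk} \alpha_j \varepsilon_{ijk}$.

$$
\begin{aligned}
\sum_{ijk} \alpha_j \varepsilon_{ijk}
&= \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} \alpha_j \varepsilon_{ijk}
= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha_j \Big( \sum_{i=1}^{n} \varepsilon_{ijk} \Big) \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha_j \Big( \sum_{i=1}^{n} (x_{ijk} - \bar{x}_{\cdot jk}) \Big) && \text{by definition of (*)} \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha_j \Big( \sum_{i=1}^{n} x_{ijk} - \sum_{i=1}^{n} \bar{x}_{\cdot jk} \Big) \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha_j \big( n\,\bar{x}_{\cdot jk} - n\,\bar{x}_{\cdot jk} \big) && \text{by (d)} \\
&= 0.
\end{aligned}
$$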



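IV. Consider the fourth mixed term of (**), $\sum_{ijk} \beta_k\, \alpha\beta_{jk}$.

$$
\begin{aligned}
\sum_{ijk} \beta_k\, \alpha\beta_{jk}
&= \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} \beta_k\, \alpha\beta_{jk}
= \sum_{i=1}^{n}\sum_{k=1}^{b} \beta_k \Big( \sum_{j=1}^{a} \alpha\beta_{jk} \Big) \\
&= \sum_{i=1}^{n}\sum_{k=1}^{b} \beta_k \Big( \sum_{j=1}^{a} (\bar{x}_{\cdot jk} - \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot k} + \bar{x}_{\cdot\cdot\cdot}) \Big) && \text{by definition of (*)} \\
&= \sum_{i=1}^{n}\sum_{k=1}^{b} \beta_k \Big( \sum_{j=1}^{a} \bar{x}_{\cdot jk} - \sum_{j=1}^{a} \bar{x}_{\cdot j \cdot} - \sum_{j=1}^{a} \bar{x}_{\cdot\cdot k} + \sum_{j=1}^{a} \bar{x}_{\cdot\cdot\cdot} \Big) \\
&= \sum_{i=1}^{n}\sum_{k=1}^{b} \beta_k \Big( \frac{1}{n} \sum_{j=1}^{a}\sum_{i=1}^{n} x_{ijk} - \frac{1}{bn} \sum_{j=1}^{a}\sum_{i=1}^{n}\sum_{k=1}^{b} x_{ijk} - a\,\bar{x}_{\cdot\cdot k} + a\,\bar{x}_{\cdot\cdot\cdot} \Big) && \text{by (d), (b), (c)} \\
&= \sum_{i=1}^{n}\sum_{k=1}^{b} \beta_k \Big( \frac{1}{n}\, an\, \bar{x}_{\cdot\cdot k} - a\,\bar{x}_{\cdot\cdot\cdot} - a\,\bar{x}_{\cdot\cdot k} + a\,\bar{x}_{\cdot\cdot\cdot} \Big) && \text{by (c), (a), (c)} \\
&= \sum_{i=1}^{n}\sum_{k=1}^{b} \beta_k \big( a\,\bar{x}_{\cdot\cdot k} - a\,\bar{x}_{\cdot\cdot k} \big) \\
&= 0.
\end{aligned}
$$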



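V. Consider the fifth mixed term of (**), $\sum_{ijk} \beta_k \varepsilon_{ijk}$.

$$
\begin{aligned}
\sum_{ijk} \beta_k \varepsilon_{ijk}
&= \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} \beta_k \varepsilon_{ijk}
= \sum_{j=1}^{a}\sum_{k=1}^{b} \beta_k \Big( \sum_{i=1}^{n} \varepsilon_{ijk} \Big) \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \beta_k \Big( \sum_{i=1}^{n} (x_{ijk} - \bar{x}_{\cdot jk}) \Big) && \text{by definition of (*)} \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \beta_k \Big( \sum_{i=1}^{n} x_{ijk} - \sum_{i=1}^{n} \bar{x}_{\cdot jk} \Big) \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \beta_k \big( n\,\bar{x}_{\cdot jk} - n\,\bar{x}_{\cdot jk} \big) && \text{by (d)} \\
&= 0.
\end{aligned}
$$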



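VI. Consider the sixth mixed term of (**), $\sum_{ijk} \alpha\beta_{jk}\, \varepsilon_{ijk}$.

$$
\begin{aligned}
\sum_{ijk} \alpha\beta_{jk}\, \varepsilon_{ijk}
&= \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} \alpha\beta_{jk}\, \varepsilon_{ijk}
= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha\beta_{jk} \Big( \sum_{i=1}^{n} \varepsilon_{ijk} \Big) \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha\beta_{jk} \Big( \sum_{i=1}^{n} (x_{ijk} - \bar{x}_{\cdot jk}) \Big) && \text{by definition of (*)} \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha\beta_{jk} \Big( \sum_{i=1}^{n} x_{ijk} - \sum_{i=1}^{n} \bar{x}_{\cdot jk} \Big) \\
&= \sum_{j=1}^{a}\sum_{k=1}^{b} \alpha\beta_{jk} \big( n\,\bar{x}_{\cdot jk} - n\,\bar{x}_{\cdot jk} \big) && \text{by (d)} \\
&= 0.
\end{aligned}
$$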
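The six mixed terms can be evaluated directly as sums of elementwise products. The sketch below uses randomly generated balanced data; the dimensions, the seed, and the dictionary keys are illustrative assumptions. All six sums should vanish up to floating-point error.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, a, b = 5, 2, 3                                # arbitrary balanced design
x = rng.normal(size=(n, a, b))

grand = x.mean()
mean_j = x.mean(axis=(0, 2), keepdims=True)
mean_k = x.mean(axis=(0, 1), keepdims=True)
mean_jk = x.mean(axis=0, keepdims=True)

components = {
    "alpha": np.broadcast_to(mean_j - grand, x.shape),
    "beta": np.broadcast_to(mean_k - grand, x.shape),
    "alpha_beta": np.broadcast_to(mean_jk - mean_j - mean_k + grand, x.shape),
    "eps": x - mean_jk,
}

# One entry per unordered pair of distinct components: the six mixed terms of (**).
mixed = {
    (p, q): float(np.sum(components[p] * components[q]))
    for p, q in itertools.combinations(components, 2)
}
for pair, value in mixed.items():
    print(pair, abs(value) < 1e-9)
```

Because every cell holds the same number of observations, each inner sum collapses exactly as in steps I through VI.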



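Thus, since all mixed terms of (**) equal 0,

$$\sum_{ijk} (x_{ijk} - \bar{x}_{\cdot\cdot\cdot})^2 = \sum_{ijk} \alpha_j^2 + \sum_{ijk} \beta_k^2 + \sum_{ijk} (\alpha\beta_{jk})^2 + \sum_{ijk} \varepsilon_{ijk}^2,$$

which is the mathematical equivalent of

SS(Total) = SS(A) + SS(B) + SS(AB) + SS(Error).

Consequently, since all possible covariance combinations equal 0,

$$\alpha \perp \beta, \quad \alpha \perp \alpha\beta, \quad \alpha \perp \varepsilon, \quad \beta \perp \alpha\beta, \quad \beta \perp \varepsilon, \quad \text{and} \quad \alpha\beta \perp \varepsilon,$$

since each of $\bar{\alpha}$, $\bar{\beta}$, $\overline{\alpha\beta}$, and $\bar{\varepsilon}$ is zero.

Thus, the 4 sums-of-squares partitions (2 main effects, 1 two-way interaction, and error) for a completely randomized factorial two-way design with fixed effects are all uncorrelated.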



Appendix A 



This appendix consists of an example using a small heuristic data set and calculations illustrating how to
work through the proof.

Example: 18 students, 9 male and 9 female, are distributed randomly among 3 training conditions:
cooperative learning, lecture, and control. Let A be the independent variable representing gender and
B be the independent variable representing training condition. This example represents a two-way (2 x
3) balanced design where A has 2 levels and B has 3 levels.

Let Y be the dependent variable representing grade/performance on a 10-point test over the chosen
topic. The following table represents test scores as a function of training condition and gender.





         Training Condition       Male (j = 1)    Female (j = 2)
(k = 1)  Cooperative Learning     5  6  7         8  8  9
(k = 2)  Lecture                  7  9  9         4  5  6
(k = 3)  Control                  2  3  4         2  3  6

number of subjects per group by gender (n = 3), i = 1, ..., 3

A - gender (a = 2), j = 1, 2

B - training condition (b = 3), k = 1, ..., 3
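The table can be entered as a 3 x 2 x 3 array and the four partitions computed directly. This is a minimal sketch; the axis ordering (subject, gender, training condition) and the variable names are assumptions made for the example, while the scores are those in the table above.

```python
import numpy as np

# Rows are training conditions k (cooperative learning, lecture, control);
# columns are the three subjects i in each cell.
male = np.array([[5, 6, 7], [7, 9, 9], [2, 3, 4]], dtype=float)
female = np.array([[8, 8, 9], [4, 5, 6], [2, 3, 6]], dtype=float)

# x[i, j, k]: subject i, gender j (0 = male, 1 = female), condition k.
x = np.stack([male.T, female.T], axis=1)          # shape (n, a, b) = (3, 2, 3)

grand = x.mean()
mean_j = x.mean(axis=(0, 2), keepdims=True)       # gender means
mean_k = x.mean(axis=(0, 1), keepdims=True)       # training-condition means
mean_jk = x.mean(axis=0, keepdims=True)           # cell means

alpha = np.broadcast_to(mean_j - grand, x.shape)
beta = np.broadcast_to(mean_k - grand, x.shape)
alpha_beta = np.broadcast_to(mean_jk - mean_j - mean_k + grand, x.shape)
eps = x - mean_jk

ss_total = float(np.sum((x - grand) ** 2))
ss_a = float(np.sum(alpha ** 2))
ss_b = float(np.sum(beta ** 2))
ss_ab = float(np.sum(alpha_beta ** 2))
ss_error = float(np.sum(eps ** 2))

print(round(grand, 7))
print(np.isclose(ss_total, ss_a + ss_b + ss_ab + ss_error))
```

The grand mean printed here is 103/18, and the four sums of squares add up to SS(Total), as the claim states.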
$$
\begin{aligned}
\bar{x}_{\cdot\cdot\cdot}
&= \frac{1}{abn} \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} x_{ijk}
= \frac{1}{18} \sum_{i=1}^{3}\sum_{j=1}^{2}\sum_{k=1}^{3} x_{ijk} \\
&= \frac{1}{18} \sum_{i=1}^{3}\sum_{j=1}^{2} (x_{ij1} + x_{ij2} + x_{ij3})
= \frac{1}{18} \sum_{i=1}^{3} \big[ (x_{i11} + x_{i21}) + (x_{i12} + x_{i22}) + (x_{i13} + x_{i23}) \big] \\
&= \frac{1}{18} \big[ (x_{111} + x_{121} + x_{112} + x_{122} + x_{113} + x_{123}) + (x_{211} + x_{221} + x_{212} + x_{222} + x_{213} + x_{223}) \\
&\qquad\quad + (x_{311} + x_{321} + x_{312} + x_{322} + x_{313} + x_{323}) \big] \\
&= \frac{1}{18} \big[ (5 + 8 + 7 + 4 + 2 + 2) + (6 + 8 + 9 + 5 + 3 + 3) + (7 + 9 + 9 + 6 + 4 + 6) \big] \\
&= \frac{103}{18} = 5.7222222
\end{aligned}
$$



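$$\bar{x}_{\cdot j \cdot} = \frac{1}{9} \sum_{i=1}^{3}\sum_{k=1}^{3} x_{ijk}$$

$$\bar{x}_{\cdot 1 \cdot} = \tfrac{1}{9}(52) = 5.7777778$$

$$\bar{x}_{\cdot 2 \cdot} = \tfrac{1}{9}(51) = 5.6666667$$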



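$$
\begin{aligned}
\bar{\alpha}
&= \frac{1}{18} \sum_{i=1}^{3}\sum_{k=1}^{3}\sum_{j=1}^{2} (\bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot})
= \frac{1}{18} \sum_{i=1}^{3}\sum_{k=1}^{3} \big[ (\bar{x}_{\cdot 1 \cdot} - \bar{x}_{\cdot\cdot\cdot}) + (\bar{x}_{\cdot 2 \cdot} - \bar{x}_{\cdot\cdot\cdot}) \big] \\
&= \frac{1}{18} \sum_{i=1}^{3}\sum_{k=1}^{3} \big( (5.7777778 - 5.7222222) + (5.6666667 - 5.7222222) \big) \\
&= \frac{1}{18} \sum_{i=1}^{3}\sum_{k=1}^{3} (0.0555556 - 0.0555556) = 0
\end{aligned}
$$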



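$$\bar{x}_{\cdot\cdot k} = \frac{1}{6} \sum_{i=1}^{3}\sum_{j=1}^{2} x_{ijk}$$

$$\bar{x}_{\cdot\cdot 1} = \tfrac{1}{6}(5 + 6 + 7 + 8 + 8 + 9) = \tfrac{1}{6}(43) = 7.166666667$$

$$\bar{x}_{\cdot\cdot 2} = \tfrac{1}{6}(7 + 9 + 9 + 4 + 5 + 6) = \tfrac{1}{6}(40) = 6.666666667$$

$$\bar{x}_{\cdot\cdot 3} = \tfrac{1}{6}(2 + 3 + 4 + 2 + 3 + 6) = \tfrac{1}{6}(20) = 3.333333333$$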



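$$
\begin{aligned}
\sum_{ijk} \alpha_j \beta_k
&= \sum_{i=1}^{3}\sum_{j=1}^{2} \alpha_j \Big( \sum_{k=1}^{3} (\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot\cdot}) \Big) \\
&= \sum_{i=1}^{3}\sum_{j=1}^{2} \alpha_j \big[ (7.16666667 - 5.72222222) + (6.66666667 - 5.72222222) + (3.33333333 - 5.72222222) \big] \\
&= \sum_{i=1}^{3}\sum_{j=1}^{2} \alpha_j (1.4444444 + 0.9444444 - 2.3888889) \\
&= \sum_{i=1}^{3}\sum_{j=1}^{2} \alpha_j (0) = 0 \quad \text{(up to rounding)}
\end{aligned}
$$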